Research Accomplishments

Below, we describe the various accomplishments of our work under grant FA9550-11-1-0312. We start by describing a general formulation, and then we describe the various results attained.

1 Mathematical Description of Distributed Decision Networks under Information Constraints

We now define a mathematical framework for networks. Let $\mathcal{G} = (V, E)$ be an undirected random network (graph) drawn from a known distribution $p_{\mathcal{G}}$,¹ composed of a finite vertex set $V$ and a link set $E \subseteq V \times V$ taken modulo $S = \{\{(i,j), (j,i)\}\}$, i.e., the ordered pairs $(i,j)$ and $(j,i)$represent the same link. Each vertex $i \in V$ corresponds to an agent, and each link $(j,i) \in E$ corresponds to a channel by which information flows from agent $j$ to agent $i$ in the network. We denote the neighborhood of $i$ by $\mathcal{N}(i) = \{j \mid (i,j) \in E\}$.

There is a state (internal or external) $W$, drawn from a distribution $p_W$, that the agents may want to estimate, transmit, and act upon. Each agent $i$ also possesses a state and some private observation about $W$. We denote the state at time $t$ by $x_i(t)$, and we assume that the tuple of initial states $(x_i(0))_i$ is correlated with $W$ and is drawn randomly from a joint distribution $p_{WX_0}$. Agent $i$’s private information/observation at time $t$ is denoted by $Y_i(t)$ and has a joint distribution $p_{WY_i(t)}$ with $W$. Finally, agent $i$ has some information about agent $j$’s state (either because it can observe it or because agent $j$ transmits it), which we denote by

$$m_{ji}(t) = m_{ji}(x_j(t)).$$

The agents autonomously update their states according to the dynamics

$$x_i(t+1) = f_i\Big(x_i(t),\ \{m_{ji}(t)\}_{j \in \mathcal{N}(i)},\ Y_i(t)\Big). \qquad (1)$$
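To make the update rule (1) concrete, the following Python sketch applies one synchronous step of the dynamics on a fixed graph. It is only an illustration under stated assumptions: states are scalars, all agents update simultaneously, and the update function `f` and message map `m` are hypothetical placeholders supplied by the caller (the framework itself leaves $f_i$ and $m_{ji}$ abstract).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signatures (not part of the original framework):
#   m(j, i, x_j)         -> message agent i receives about agent j's state, m_ji(t)
#   f(i, x_i, msgs, y_i) -> agent i's next state; msgs maps j -> m_ji(t)
Message = Callable[[int, int, float], float]
Update = Callable[[int, float, Dict[int, float], float], float]


def step(x: Sequence[float], neighbors: Sequence[Sequence[int]],
         f: Update, m: Message, y: Sequence[float]) -> List[float]:
    """One synchronous application of x_i(t+1) = f_i(x_i(t), {m_ji(t)}_{j in N(i)}, Y_i(t))."""
    new_x = []
    for i, x_i in enumerate(x):
        msgs = {j: m(j, i, x[j]) for j in neighbors[i]}  # information arriving from N(i)
        new_x.append(f(i, x_i, msgs, y[i]))
    return new_x


# Example: noiseless messages and an averaging update that also uses the observation.
def identity_message(j: int, i: int, x_j: float) -> float:
    return x_j


def average_update(i: int, x_i: float, msgs: Dict[int, float], y_i: float) -> float:
    return (x_i + sum(msgs.values()) + y_i) / (len(msgs) + 2)


line_graph = [[1], [0, 2], [1]]  # path graph on three agents
print(step([0.0, 3.0, 6.0], line_graph, average_update, identity_message, [0.0, 0.0, 0.0]))
```

Passing `f` and `m` as arguments keeps the sketch agnostic to the algorithm class $\mathcal{F}$ and message class $\mathcal{M}$; constraining them amounts to restricting which callables are allowed.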

The performance of the system, denoted $J(\{f_i\}, \{m_{ji}\}, \{p_{WY_i}\}, p_{WX_0})$, is governed by the information $\{p_{WY_i}\}$ and the dynamics/algorithms $\{f_i\}$ and $\{m_{ji}\}$, so we can consider optimizing the system’s performance over these parameters. To this end, we consider a class of information types $\{p_{WY_i}\} \in \mathcal{P}$ over which the information can take its distribution, as well as a set of dynamics/algorithms $\{f_i\} \in \mathcal{F}$ and messages/information $\{m_{ji}\} \in \mathcal{M}$ that appropriately constrain the dynamics of the system.

2 Static Decision Networks under Communication Constraints

Below, we summarize known research and our main contributions for the static problem.

2.1 Hypothesis Testing under Capacity Constraints

One of the simplest decision systems one can consider is binary hypothesis testing, where a decision between two hypotheses is made using observations. This fundamental problem has applications in, for example, target identification and multi-mode system identification.

¹ At times, we will be interested in analyzing the performance of the network with respect to broad properties determined by $p_{\mathcal{G}}$, while at other times we will be interested in an analysis with respect to properties of a specific graph $G$. In the latter case, we simply set $p_{\mathcal{G}}$ to the degenerate distribution $p_{\mathcal{G}}(\mathcal{G}) = \mathbf{1}_{\{\mathcal{G} = G\}}$.
In this problem, the vertex set $V$ is a single agent gathering information $Y$ about a random variable $X$ through a channel of fixed capacity $c$. The state of the world $W$ can assume two discrete values $w_1$ and $w_2$ with known probabilities, and the distribution of $X$ depends on $W$; that is, $X \sim p_i$ if $W = w_i$, where $p_i$ belongs to the $n$-dimensional probability simplex. The information $(Y(t))_t$ gathered is a sequence of i.i.d. samples that have been filtered through the channel $p_{XY}$ and whose alphabet has a size $k$ that can be chosen arbitrarily.

In this setting, we use the traditional Shannon channel capacity parametrization of imperfect information. Specifically, we use sets $\mathcal{P}(c)$ parametrized by a capacity $c$, according to $\mathcal{P}(c) = \{p_{XY} \mid \max_{p_X} I(X;Y) \le c\}$. We also use the following approximation, valid for small capacities, derived in [?]:

$$\tilde{\mathcal{P}}(c, p_0) = \Big\{p_{XY} \;\Big|\; \tfrac{1}{2}\,\big\|p_{Y|X=i} - p_0\big\|^2_{[p_0]^{-1}} \le c,\quad i = 1, \ldots, n\Big\},$$

where the capacity constraint is replaced by $n$ quadratic inequalities, the norm is weighted by the inverse of $p_0$ (i.e., $\|v\|^2_{[p_0]^{-1}} = \sum_y v(y)^2 / p_0(y)$), and the output probability distribution lies around $p_0$.
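The local approximation behind $\tilde{\mathcal{P}}(c, p_0)$ can be checked numerically. The Python sketch below is an illustration, not the derivation of [?]: it compares the exact mutual information $I(X;Y)$ (in nats) with the weighted quadratic form $\sum_x p_X(x)\,\tfrac{1}{2}\sum_y (p_{Y|X=x}(y) - p_0(y))^2 / p_0(y)$ for a channel whose rows are small perturbations of $p_0$. The reference distribution, perturbation direction, size `eps`, and uniform input distribution are arbitrary assumptions, chosen so that the output distribution equals $p_0$ exactly.

```python
import math


def mutual_information(p_x, channel):
    """Exact I(X;Y) in nats; channel[x][y] = p_{Y|X}(y|x)."""
    n_y = len(channel[0])
    p_y = [sum(px * row[y] for px, row in zip(p_x, channel)) for y in range(n_y)]
    total = 0.0
    for px, row in zip(p_x, channel):
        for y, pyx in enumerate(row):
            if px > 0 and pyx > 0:
                total += px * pyx * math.log(pyx / p_y[y])
    return total


def quadratic_surrogate(p_x, channel, p0):
    """Sum over x of p_x(x) * 0.5 * sum_y (p_{Y|X=x}(y) - p0(y))^2 / p0(y)."""
    return sum(
        px * 0.5 * sum((pyx - q) ** 2 / q for pyx, q in zip(row, p0))
        for px, row in zip(p_x, channel)
    )


p0 = [0.5, 0.3, 0.2]
eps = 0.01
# Two input symbols pushing the output in opposite directions, so that p_Y = p0.
channel = [
    [p0[0] + eps, p0[1] - eps, p0[2]],
    [p0[0] - eps, p0[1] + eps, p0[2]],
]
p_x = [0.5, 0.5]
exact = mutual_information(p_x, channel)
approx = quadratic_surrogate(p_x, channel, p0)
print(f"I(X;Y) = {exact:.6e} nats, quadratic form = {approx:.6e}")
```

Shrinking `eps` makes the two numbers agree more closely, which is the regime (small capacity) in which the quadratic constraints stand in for the capacity constraint.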

In this framework, the agent’s performance is naturally based on the probability of error, that is, of declaring “$W = w_1$” when the true state of the world is $W = w_2$, and vice versa. In our case, rather than measure performance by the minimum number of samples required to make a decision with a fixed probability of error (quickest detection), we focus our attention on the rate at which the probability of error decreases as the number of collected samples increases. In this setting, the relationship between information and detection rates (i.e., the value of information) is related to a function $C(p_1, p_2, p_{XY})$ called the Chernoff information. The Chernoff information relates the asymptotic behavior of the estimation to the probability of error: for a large number of samples $t$, the probability of error can be written in terms of the Chernoff information as

$$P_e(t) \approx e^{-C(p_1, p_2, p_{XY})\, t}.$$
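The Chernoff information can be computed directly. The Python sketch below assumes (standard, but an assumption here) that $C$ is evaluated between the two output distributions $q_j(y) = \sum_x p_j(x)\,p_{Y|X}(y|x)$ induced by the channel, via $C = -\min_{0 \le \lambda \le 1} \log \sum_y q_1(y)^{\lambda} q_2(y)^{1-\lambda}$. The minimization uses golden-section search, which is valid because the objective is convex in $\lambda$. The helper names and the binary symmetric channel in the example are hypothetical.

```python
import math


def output_distribution(p, channel):
    """q(y) = sum_x p(x) * channel[x][y] for a row-stochastic channel[x][y] = p_{Y|X}(y|x)."""
    return [sum(px * row[y] for px, row in zip(p, channel)) for y in range(len(channel[0]))]


def chernoff_information(q1, q2, tol=1e-10):
    """C(q1, q2) = -min_{0 <= lam <= 1} log sum_y q1(y)^lam * q2(y)^(1 - lam), in nats."""
    def log_sum(lam):
        return math.log(sum(a ** lam * b ** (1.0 - lam) for a, b in zip(q1, q2)))

    # log_sum is convex in lam, so golden-section search converges to its minimum.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = log_sum(x1), log_sum(x2)
    while hi - lo > tol:
        if f1 < f2:          # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = log_sum(x1)
        else:                # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = log_sum(x2)
    return -min(f1, f2)


# Hypothetical example: two hypotheses on a binary X observed through a binary symmetric channel.
p1, p2 = [0.8, 0.2], [0.2, 0.8]
bsc = [[0.9, 0.1], [0.1, 0.9]]
q1, q2 = output_distribution(p1, bsc), output_distribution(p2, bsc)
C = chernoff_information(q1, q2)
for t in (10, 50, 100):
    print(t, math.exp(-C * t))  # first-order approximation of P_e(t)
```

A noisier channel moves `q1` and `q2` closer together, lowering $C$ and slowing the decay of the error probability, which is how channel quality enters the performance measure.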

Therefore, in our case, we can equivalently express the average performance of the system with a channel $p_{XY}$ as $J(p_{XY}) = e^{-C(p_1, p_2, p_{XY})}$, so that the optimal performance of such a system over all possible channels is obtained by solving the optimization problem

$$J(c) = \min_{p_{XY} \in \mathcal{P}(c)} J(p_{XY}). \qquad (2)$$

We can explicitly solve this problem in the small-capacity regime by substituting $\tilde{\mathcal{P}}(c, p_0)$ for $\mathcal{P}(c)$ and by using a quadratic approximation of the Chernoff information, yielding the following theorem [?].


Theorem 1 (Value of information in hypothesis testing) For small capacities $c$,

$$J(c) = 1 - \ldots\, \|p_1 - p_2\| \ldots$$
The optimizing distribution $p^*_{XY}$ of optimization (2), which yields Theorem 1, also gives us an idea of the actionable information in this framework. Let $\mathcal{X}_1 = \{x \mid p_1(x) > p_2(x)\}$ and $\mathcal{X}_2 = \{x \mid p_2(x) > p_1(x)\}$. The optimizing channel applies opposing weights to symbols in $\mathcal{X}_1$ and $\mathcal{X}_2$, and it applies arbitrary weights to symbols not in $\mathcal{X}_1 \cup \mathcal{X}_2$, the so-called inactionable information set. This is an important point for this application: a channel optimized for estimation of $X$ is not necessarily the channel optimized for hypothesis testing.

We also have the following relationship between the size of the output alphabet and the channel capacity.

Theorem 2 (Information content and latency in hypothesis testing) Let $k$ be the size of the output alphabet (i.e., $Y \in \{y_i\}_{1 \le i \le k}$), and let $p^k_{XY}$ be the optimizing channel for that output alphabet size. For $k \ge k_{\min} = \lceil e^c \rceil$, $J(p^k_{XY}) = J(p^{k_{\min}}_{XY})$.

In other words, for a fixed capacity, optimizing over a set of channels with an output alphabet larger than $\lceil e^c \rceil$ does not improve performance. Another interesting aspect of the optimizing channels is that, because they are tuned to hypothesis testing rather than to estimation of $X$, decoding information through the channel is simple.

2.2 The Impact of the Network Topology on Distributed Hypothesis Testing

We now move to a type of distributed hypothesis testing where the agents seek to guess the state of the world $W$ through individual trials. However, rather than study how the error rate diminishes as an agent accumulates unbounded information, we study how information flow through the network impacts the error rate over an unbounded chain of agents. Ultimately, we are interested in determining whether the network eventually “guesses” the correct state of the world, in which case we say that the network “learned.” Interestingly, learning is not guaranteed in this setting. As we will see, the decision network’s topology can have unintended effects on how errors propagate through the network.

To illustrate the theoretical issues arising in this context, let us consider a learning problem over a social network of rational agents. The question is whether agents are able to extract valuable information about an unobservable parameter simply by observing the behavior of their neighbors. In other words, we are interested in understanding which network structures lead to learning the unknown parameter and which networks can generate herd behavior, where, in the perfect Bayesian Nash equilibrium, agents can extract only a limited amount of information from their neighbors’ actions.

As before, let $W$ denote an underlying state of the world, unknown to the agents, and suppose for simplicity that $W$ takes two values $W_L$ and $W_H > W_L$. Suppose also that each individual receives an imperfectly informative signal about the value of $W$ (also referred to as his private belief), denoted by $Y$, which is independent and identically distributed across individuals. It is common knowledge that the signal has conditional distribution $p(Y \mid W)$ in the states $W = W_H$ and $W = W_L$. We assume that the distributions $p(Y \mid W_H)$ and $p(Y \mid W_L)$ are absolutely continuous with respect to one another and have a common support $[\beta, \bar{\beta}] \subseteq [0, 1]$. We say that the private beliefs are bounded if $0 < \beta < \bar{\beta} < 1$, and unbounded if $[\beta, \bar{\beta}] = [0, 1]$. For unbounded beliefs, signals in favor of state $W_L$ are more likely to occur in state $W_L$ than in state $W_H$, i.e., there is an underlying tendency for the truth to be revealed in the signals.
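The bounded/unbounded distinction can be seen by computing the private belief, taken here to be the posterior probability of $W_H$ given a single signal. The Python sketch below assumes equal prior probabilities for the two states (an assumption made for illustration) and contrasts two hypothetical signal models: a binary signal that matches the state with probability 0.7, whose private beliefs stay in $[0.3, 0.7]$ (bounded), and a Gaussian signal $Y \sim \mathcal{N}(\pm 1, 1)$, whose private beliefs approach 0 and 1 for extreme observations (unbounded).

```python
import math


def private_belief(lik_high, lik_low, prior_high=0.5):
    """Posterior P(W = W_H | Y = y) from the two likelihoods of the observed signal."""
    num = prior_high * lik_high
    return num / (num + (1.0 - prior_high) * lik_low)


# Binary signal that agrees with the state with probability 0.7 (hypothetical).
binary_beliefs = [private_belief(0.7, 0.3), private_belief(0.3, 0.7)]


def gaussian_pdf(y, mean):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


# Gaussian signal with mean +1 under W_H and -1 under W_L (hypothetical).
gaussian_beliefs = [private_belief(gaussian_pdf(y, 1.0), gaussian_pdf(y, -1.0))
                    for y in (-5.0, 0.0, 5.0)]

print("binary:", binary_beliefs)      # stays within [0.3, 0.7]: bounded
print("gaussian:", gaussian_beliefs)  # tends to 0 and 1 as |y| grows: unbounded
```

With bounded beliefs, a long enough run of contrary neighbor actions can outweigh any single signal, which is the mechanism behind herding; unbounded beliefs always leave room for a sufficiently strong signal to break it.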
A realistic framework of learning in a multi-agent system must model the structure of the social networks through which individuals observe and communicate with each other.² Thus, we assume that individuals form a social network in which each agent can observe the actions of (a subset of) other individuals who have moved in the past. After observing their private signal ($Y_i$) and other available information about the actions of other individuals, each individual makes a decision. Beliefs are formed in a Bayesian manner based on the content of the private signal and the knowledge of what predecessors have done (i.e., the actions of people who have moved before).

We are interested in understanding what agents can learn from one another in the long run when they update their beliefs in a Bayesian fashion. To this end, we develop a theory of learning and dynamic belief formation when individuals observe not the entire action history, but rather the actions of a neighborhood of individuals. Notice that both the analysis and the equilibrium outcome are significantly different if individuals do not observe all past actions, but only a subset of them, which may, for example, be randomly chosen from the entire set of past actions. One difficulty with this class of models is that, to determine how beliefs will evolve, we need to characterize the perfect Bayesian Nash equilibrium, which involves rather complex inferences by individuals.

In recent work [?], we have developed a new framework for learning dynamics over a very general (deterministic or stochastic) social network of agents (see also related work [?] and [?]). In particular, we consider a countably infinite number of agents, each of which makes a decision $x_n$ sequentially. We assume that the neighborhood of agent $n$, $B(n)$, is stochastically generated according to an arbitrary probability distribution $p_{\mathcal{G}_n}$ over the set of all subsets of $\{1, \ldots, n-1\}$. The sequence $\{p_{\mathcal{G}_n}\}_n$ is the network topology of the social network formed by the agents. The network topology is common knowledge, whereas the realized neighborhood $B(n)$ is the private information of agent $n$. Notice that if $B(n)$ is a strict subset of $\{1, \ldots, n-1\}$ for some $n > 2$, then the social beliefs do not form a martingale, and, as a result, one cannot apply Doob’s martingale convergence theorem in the analysis.

We provide a systematic characterization of the conditions under which there will be equilibrium information aggregation in social networks. We say that there is information aggregation, or equivalently asymptotic learning, when, in the limit as the size of the social network becomes arbitrarily large, individual actions converge (in probability) to the action that yields the higher payoff. The key property of the network topology relevant to asymptotic learning turns out to be the expanding observations property.

To describe this concept, let us first introduce another notion: a finite group of agents is excessively influential if there exist infinitely many agents who, with probability uniformly bounded away from 0, observe only the actions of a subset of this group. For example, a group is excessively influential if it is the source of all information (except individual signals) for an infinitely large component of the social network. If there exists an excessively influential group of individuals, then the social network has nonexpanding observations; conversely, if there exists no excessively influential group, the network has expanding observations. This definition implies that most reasonable social networks have expanding observations; in particular, a minimal amount of “arrival of new information” in the social network is sufficient for the expanding observations property. For example, the environment studied in most of the previous work in this area, where all past actions are observed, has expanding observations. Similarly, a social network in which each individual observes one uniformly drawn individual from those who have taken decisions in the past, and a network in which each individual observes his immediate neighbor, both feature expanding observations. Note also that a social network with expanding observations need not be connected. A simple, but typical, example of a network with nonexpanding observations is one in which all future individuals observe only the actions of the first $K < \infty$ agents.

² Although there is a large literature in economics on social learning (see [?], [?], [?]), this literature does not focus on the implications of the social network topology and interaction structure for information dissemination and belief formation. Most of this work relies on the assumption of perfect observability of the ordered history. Under this (implausible) assumption, the posterior beliefs form a martingale, which significantly simplifies the analysis, as one can simply apply Doob’s martingale convergence theorem.
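The verbal definition of expanding observations can be made quantitative. One formalization used in the social-learning literature, adopted here as an assumption since the text states the property only in words, says the topology has expanding observations if, for every fixed $K$, $\lim_{n\to\infty} \mathbb{P}\big(\max_{b \in B(n)} b < K\big) = 0$, with the maximum over an empty neighborhood taken to be 0. The Python sketch below estimates this probability by Monte Carlo for the two examples in the text: each agent observing one uniformly drawn predecessor (expanding), and every agent observing only the first $K_0$ agents (nonexpanding). The function names and the choice $K_0 = 3$ are illustrative.

```python
import random


def uniform_single_predecessor(n, rng):
    """B(n) = {one agent drawn uniformly from 1..n-1}."""
    return {rng.randint(1, n - 1)} if n > 1 else set()


def first_agents_only(k0):
    """B(n) = {1, ..., min(k0, n-1)} for every n: a nonexpanding topology."""
    return lambda n, rng: set(range(1, min(k0, n - 1) + 1))


def prob_only_early(sample_neighborhood, n, K, trials=20_000, seed=0):
    """Monte Carlo estimate of P(max_{b in B(n)} b < K), with max(empty set) = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        b = sample_neighborhood(n, rng)
        if max(b, default=0) < K:
            hits += 1
    return hits / trials


for n in (10, 100, 10_000):
    print(n,
          prob_only_early(uniform_single_predecessor, n, K=5),
          prob_only_early(first_agents_only(3), n, K=5))
```

For the uniform topology the exact value is $(K-1)/(n-1)$, which vanishes as $n$ grows, while for the first-$K_0$ topology it stays at 1: the first $K_0$ agents form an excessively influential group.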

We establish the following result for the perfect Bayesian Nash equilibrium of the learning game:

Theorem 3 (Impact of network topology on learning) If the network topology is nonexpanding, then there will be no asymptotic learning. Conversely, if private beliefs are unbounded and the network topology is expanding, then there will be asymptotic learning.

This is a striking result (particularly if we consider unbounded beliefs to be a better approximation of reality than bounded beliefs), since, as explained above, almost all reasonable social networks have the expanding observations property. This theorem implies, for example, that when some individuals, such as “informational leaders,” are overrepresented in the neighborhoods of future agents (and are thus “influential,” though not excessively so), learning may slow down, but asymptotic learning will still obtain as long as private beliefs are unbounded.


2.3 Value of Information in Shortest Path and Network Flow Optimizations

We now address the performance of decision networks by considering the limitations of one agent who makes a single decision under uncertainty [?]. The framework corresponds, for example, to a central decision maker who obtains information from one or many imperfect distributed sensors and, hence, has immediate applications in strategic planning under uncertainty. A natural question that arises in this setting is how information quantity (as determined by the quality and/or number of sensors) impacts decision quality.

In this framework of a single decision, we measure the performance of the agent’s decision, given by its state $x(1)$ at time $t = 1$ ($x(0)$ is irrelevant). $\mathcal{X}$ is the set of decisions available to the agent, so that $x(1) \in \mathcal{X}$, and $l(x, W)$ is the cost (performance) of a decision $x \in \mathcal{X}$, where the additional argument $W$ (the state of the world) acts as a random perturbation of the decision’s quality. The agent’s private information $Y$ about $W$ dictates the action. The measure of performance for information governed by the distribution $p_{WY}$ is

$$J(p_{WY}) = \mathbb{E}\Big[\min_{x(1) \in \mathcal{X}} \mathbb{E}\big[l(x(1), W) \mid Y\big]\Big].$$
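Because the inner minimization is taken after conditioning on $Y$, $J(p_{WY})$ can be estimated by sampling. The Python sketch below assumes a linear cost $l(x, W) = x^T W$ over a finite decision set, so that $\mathbb{E}[l(x,W) \mid Y] = x^T \mathbb{E}[W \mid Y]$, together with a hypothetical jointly Gaussian model: $W \sim \mathcal{N}(0, I_2)$, $Y = W + N$ with $N \sim \mathcal{N}(0, s^2 I_2)$, and two decisions $x \in \{e_1, e_2\}$. In this model $\mathbb{E}[W \mid Y] = Y/(1+s^2)$, each of its components has variance $v = 1/(1+s^2)$, and the exact value is $J = -\sqrt{v/\pi}$ (the expected minimum of two independent $\mathcal{N}(0, v)$ variables), which provides a check on the estimate.

```python
import math
import random


def estimate_J(decisions, sample_y, posterior_mean, n_samples=100_000, seed=0):
    """Monte Carlo estimate of J = E[min_x E[l(x, W) | Y]] for linear costs l(x, W) = x . W."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_hat = posterior_mean(sample_y(rng))
        # With a linear cost, E[l(x, W) | Y] = x . E[W | Y].
        total += min(sum(xi * wi for xi, wi in zip(x, w_hat)) for x in decisions)
    return total / n_samples


s = 1.0                       # observation-noise standard deviation (assumed)
v = 1.0 / (1.0 + s ** 2)      # variance of each component of E[W | Y]


def sample_y(rng):
    # W ~ N(0, I_2) and Y = W + N with N ~ N(0, s^2 I_2).
    return [rng.gauss(0.0, 1.0) + rng.gauss(0.0, s) for _ in range(2)]


def posterior_mean(y):
    return [yi / (1.0 + s ** 2) for yi in y]


decisions = [(1.0, 0.0), (0.0, 1.0)]
estimate = estimate_J(decisions, sample_y, posterior_mean)
print(estimate, -math.sqrt(v / math.pi))
```

Without information ($s \to \infty$, so $v \to 0$) the estimate tends to $J = 0$, the cost of the best uninformed decision; better sensors increase $v$ and lower $J$.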
We continue the general setting stated previously with a performance-centric problem of high importance: shortest path optimization on a graph. In this problem, the decision set $\mathcal{X}$ is the set of possible paths in a directed acyclic graph $\mathcal{G} = (V, E)$ with vertices $V$ and edges $E$, and each edge $e \in E$ has a random edge weight $W_e$.

Although computing $J(p_{WY})$ is NP-hard, we can leverage the geometric properties of the shortest-path polytope $\mathcal{X}$ of $\mathcal{G}$ and the properties of $\mathcal{P}(c)$ to obtain fundamental performance bounds as well as simple characterizations of the actionable information [?]. First, rather than applying a traditional Shannon channel capacity parameterization of imperfect information, we adopt a different representation. Specifically, we use sets $\mathcal{P}(c)$ parameterized by a scalar $c$, which we still term capacity, according to $\mathcal{P}(c) = \{p_{WY} \mid \mathrm{VAR}[\mathbb{E}[W \mid Y]] \le c\}$. The optimizing distribution $p^*_{WY} \in \mathcal{P}(c)$, as well as the best achievable performance from a $c$-amount of information, is determined by the solution to

$$J(c) = \min_{p_{WY} \in \mathcal{P}(c)} J(p_{WY}). \qquad (3)$$

Using this definition of information allows us to quantify the value of information according to the following theorem [?].

Theorem 4 (Value of information) A lower bound for shortest path optimization performance under capacity $c$ is

$$J(c) \ge J(0) - \frac{d}{2}\sqrt{c},$$

where $d$ is the diameter of $\mathcal{X}$. The bound is “sharp” if $\mathbb{E}[W] = 0$ and $c \le \mathrm{VAR}[W_e]$ for all $e \in E$.
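Evaluating the bound requires the diameter $d$ of $\mathcal{X}$. As a small illustration, the Python sketch below assumes each path is represented by its 0/1 edge-incidence vector (consistent with the linear costs $x^T W$ mentioned in footnote 3) and computes $d$ by brute-force enumeration of source-to-target paths in a small DAG. Since the diameter of the convex hull of finitely many points is attained between two of those points, it suffices to compare pairs of paths, and $\|x - x'\|^2$ is the number of edges used by exactly one of the two paths. Enumeration is exponential in general and is only meant for toy graphs; the graph below is hypothetical.

```python
import itertools
import math


def enumerate_paths(edges, source, target):
    """All source-to-target paths of a DAG, each as a frozenset of edge indices."""
    out_edges = {}
    for idx, (u, v) in enumerate(edges):
        out_edges.setdefault(u, []).append((idx, v))
    paths = []

    def dfs(node, used):
        if node == target:
            paths.append(frozenset(used))
            return
        for idx, nxt in out_edges.get(node, []):
            dfs(nxt, used + [idx])

    dfs(source, [])
    return paths


def diameter(paths):
    """Largest Euclidean distance between the 0/1 incidence vectors of two paths."""
    # ||x - x'||^2 = number of edges used by exactly one of the two paths.
    return max((math.sqrt(len(p ^ q)) for p, q in itertools.combinations(paths, 2)), default=0.0)


# Hypothetical DAG: two two-edge routes and one direct edge from s to t.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("s", "t")]
paths = enumerate_paths(edges, "s", "t")
d = diameter(paths)
print(len(paths), d)
```

Here the two disjoint two-edge routes are farthest apart, so $d = 2$; the direct edge is at distance $\sqrt{3}$ from each of them.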

In short, the fastest rate of improvement for shortest path optimization is the square root of the capacity $c$.³ We can further leverage the geometry of $\mathcal{X}$ to characterize the set of actionable information that should be communicated to the agent.

Theorem 5 (Actionable information) The actionable component of the information vector $W$ is the component that lies in the smallest subspace containing $\mathcal{X}$.

The theorem tells us that the information contained in this subspace is all that is needed to choose the optimal path. The orthogonal component only improves estimation power, which is irrelevant to the agent’s objective. In fact, as long as the variance of the estimate in the actionable subspace is $c$, the estimate of $W$ as a whole can be arbitrarily bad, and the agent can still achieve the same performance.
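Theorem 5 can be illustrated numerically. Assuming that paths are 0/1 edge-incidence vectors and that the cost is linear, $x^T W$, the Python/NumPy sketch below projects a weight vector onto the span of the path vectors (the smallest subspace containing $\mathcal{X}$) and checks that every path cost, and hence the optimal path, is unchanged when the orthogonal component is discarded. The three-path graph and the random weights are hypothetical.

```python
import numpy as np


def actionable_component(paths, w):
    """Orthogonal projection of w onto span{x : x in paths} (one path per row)."""
    coeffs, *_ = np.linalg.lstsq(paths.T, w, rcond=None)
    return paths.T @ coeffs


# Edge order: s->a, a->t, s->b, b->t, s->t; one row per s->t path.
paths = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=5)
w_act = actionable_component(paths, w)
w_orth = w - w_act

print("path costs with full W:      ", paths @ w)
print("path costs with actionable W:", paths @ w_act)
print("orthogonal part changes costs by", paths @ w_orth)
```

Here the span has dimension 3 in a 5-dimensional edge space, so two directions of $W$ are irrelevant to the decision and need not be communicated.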

In [?], it is shown that a practical scheme for concentrating the information vector on its actionable component is simply to compare two paths of the graph. In the Gaussian case, this choice provably provides good performance.

³ We can extend the same bound to any linear combinatorial problem like shortest path optimization; further, as long as the original combinatorial problem can be solved in polynomial time, the lower bound can be computed in polynomial time (since $d$ must be computed). The bound also holds for affine costs of the form $l(x, W) = g(x) + x^T W$.
Theorem 6 If we restrict $p_{WY} \in \mathcal{P}'(c)$, where

$$\mathcal{P}'(c) = \mathcal{P}(c) \cap \{p_{WY} \mid W \text{ and } Y \text{ are jointly Gaussian}\},$$

then

where the optimizing $p^*_{WY} \in \mathcal{P}'(c)$ places all sensors along two paths of the graph.

This performance-centric, information-energy setting can be further extended to a quasi-dynamic setting in which information is gradually revealed to the agents. If information is randomly broadcast to the agent over time with no consideration of the agent’s past decisions, one can show that $J(c) \ge J(0) - \sqrt{c - \Delta}$, where $\Delta$ is related to the covariance of the information and to how the dimensionality of $\mathcal{X}$ is reduced as decisions are made (which impacts how information energy must be concentrated). It can also be shown that the fundamental limit can be improved if future information accounts for the agent’s past decisions.

We can immediately generalize the single-agent shortest path problem to multi-agent network flow optimization. In this problem, we have $R$ agents who seek to traverse a graph along paths of minimal length. However, the length of each edge is determined both by a random length $W$ and by congestion due to multiple agents trying to access the edge simultaneously.

Formally, we extend the decision set to $\mathcal{X}^R$, where $\mathcal{X}$ is the set of paths in the graph and $\mathcal{X}^R$ is the set of path choices available to the $R$ agents. A natural definition for the performance of the distributed system is the cumulative length traveled by all agents through the graph, denoted $l(x^R, W)$ for a joint decision $x^R \in \mathcal{X}^R$.

As in the shortest path case, determining the value of information is computationally hard, but we can derive a fundamental bound for the value of information by leveraging geometric properties of $\mathcal{X}$. This bound is given by the following theorem [?].

Theorem 7 (Value of information in network flow optimization) A lower bound for network flow optimization among cooperative, distributed agents under capacity $c$ is $J(c) \ge J(0) - O(c)$.

Simulations verify that performance can improve linearly in the case of i.i.d. Gaussian edge weights and $c \le \mathrm{VAR}[W_e]$. Now, unlike in shortest path optimization, the set of actionable information is no longer the subspace containing $\mathcal{X}$, because the number of agents relative to the variances of the individual edges impacts performance.⁴ However, it is possible to show that performance does improve at the rate $\sqrt{c}$ if information is concentrated on at most two paths, consistent with shortest path optimization.

3 Dynamic Decision Networks under Communication Constraints

We now consider a dynamic setting in which stability and robustness must be considered in addition to performance. We will see that graph topology, information rate, and the rate at which agents respond to information can all significantly influence these important factors.

⁴ This is due, in part, to the fact that $l(x^R, W)$ is a quadratic function, not a linear function as in shortest path optimization.
3.1 Stability of Capacity-Constrained Computation Networks

Let us assume that an efficient distributed iterative algorithm exists for the computation of a certain function on a network whose links are assumed to support the noiseless transmission of real-valued messages. Important examples include iterative distributed averaging algorithms, as well as belief propagation algorithms for the computation of marginal distributions in graphical models. When dealing with the finite-capacity, and possibly noisy, communication link setting outlined in the previous section, a natural approach consists in trying to adapt such algorithms to cope with imperfect information transmission over the channels. Provided that the original iterative algorithm can be seen as a contraction in some metric space, it is not hard to see that adding bounded noise in each iteration results in the accumulation of bounded noise, whose magnitude can be controlled by controlling the magnitude of the noise introduced in each iteration.
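The accumulation argument can be made explicit in one line. As a sketch, assume the noiseless iteration is $x_{t+1} = T(x_t)$ with $T$ a $\kappa$-contraction ($\kappa < 1$) and fixed point $x^* = T(x^*)$, and that the implemented iteration is $x_{t+1} = T(x_t) + n_t$ with $\|n_t\| \le \delta$. Writing $e_t = \|x_t - x^*\|$,

```latex
e_{t+1} = \|T(x_t) - T(x^*) + n_t\| \le \kappa\, e_t + \delta
\quad\Longrightarrow\quad
e_t \le \kappa^t e_0 + \frac{1 - \kappa^t}{1 - \kappa}\,\delta,
\qquad
\limsup_{t \to \infty} e_t \le \frac{\delta}{1 - \kappa}.
```

The accumulated error is therefore bounded and scales linearly with the per-iteration noise magnitude $\delta$.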

However, distributed algorithms are typically not contractions globally, but rather exhibit some marginally stable or unstable subspaces. As an example, typical iterative distributed averaging algorithms can be represented as multiplication by some irreducible doubly stochastic matrix $P$. Such matrices have a one-dimensional marginally stable eigenspace generated by the all-ones vector, and are contractions when restricted as operators to the subspace orthogonal to this eigenspace. When the noisy channel transmission allows for perfect feedback, whereby every node knows the corrupted message that its neighbors receive from it (quantized transmission is a particular case), the node can compensate for the noise by subtracting it from its current state. Formally, if $x_i(t)$ is agent $i$’s state at time $t$ and $m_{ij}(t)$ is agent $j$’s estimate of $x_i(t)$, then each node simultaneously updates its state as

$$x_i(t+1) = x_i(t) + \sum_j P_{ij}\, m_{ji}(t) - \sum_j P_{ji}\, m_{ij}(t).$$

This update rule preserves the average of the states, $n^{-1}\sum_i x_i(t)$, so that noise acts only on the subspace orthogonal to the all-ones vector, where the algorithm is contractive. Indeed, one has the following result [?].

Theorem 8 (Effect of information concentration on stability) Let $\rho$ be the essential spectral radius of $P$, i.e., the second largest modulus of its eigenvalues. Assume that $\mathbb{E}[(x_i(t) - m_{ij}(t))^2] \le \sigma^2$ for all $i, j$ and $t > 0$. Let $\bar{x} = n^{-1}\sum_i x_i(0)$. Then

$$\limsup_{t \to +\infty} \sum_i \mathbb{E}\big[(x_i(t) - \bar{x})^2\big] \le \frac{\sigma^2}{1 - \rho}.$$
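The compensation mechanism can be simulated directly. The NumPy sketch below implements the compensated update $x_i(t+1) = x_i(t) + \sum_j P_{ij}\, m_{ji}(t) - \sum_j P_{ji}\, m_{ij}(t)$ under illustrative assumptions: independent Gaussian errors with standard deviation $\sigma$ on every link (so $\mathbb{E}[(x_i(t) - m_{ij}(t))^2] = \sigma^2$), perfect feedback so that node $i$ knows each $m_{ij}(t)$, and a hypothetical lazy ring matrix $P$ on five nodes. It checks that the network average is preserved (up to floating-point error) while the disagreement $\sum_i (x_i(t) - \bar{x})^2$ settles at a small level; it does not attempt to verify the constant in Theorem 8.

```python
import numpy as np


def compensated_averaging(P, x0, steps, sigma, seed=0):
    """Iterate x_i(t+1) = x_i(t) + sum_j P_ij m_ji(t) - sum_j P_ji m_ij(t),
    where m_ij(t) = x_i(t) + noise is node j's corrupted copy of x_i(t)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        M = x[:, None] + sigma * rng.standard_normal((n, n))  # M[i, j] = m_ij(t)
        received = (P * M.T).sum(axis=1)   # sum_j P_ij m_ji(t)
        sent = (P.T * M).sum(axis=1)       # sum_j P_ji m_ij(t), known via feedback
        x = x + received - sent
    return x


n = 5
shift = np.roll(np.eye(n), 1, axis=1)
P = 0.5 * np.eye(n) + 0.25 * (shift + shift.T)  # lazy ring: doubly stochastic
x0 = np.arange(n, dtype=float)                   # initial average is 2.0
x_final = compensated_averaging(P, x0, steps=200, sigma=0.01)
print("average:", x_final.mean(), "disagreement:", np.sum((x_final - x0.mean()) ** 2))
```

The average is preserved because the two sums cancel when added over all nodes, which is the property the compensation is designed to secure; without the subtracted term, link noise would make the average drift like a random walk.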

3.2 Stability and Performance of Capacity-Constrained Feedback Control

One of the most powerful results capturing performance trade-offs in a stable feedback system is the Bode integral formula [?]. In this classical result, it can be shown that for any strictly proper LTI plant $P$ with unstable poles $\{\lambda_i\}_i$, the transfer function $S(z)$ from the disturbance $d$ to the input $e$ of $P$ (also known as the sensitivity function) must satisfy the constraint

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \big[\log|S(e^{j\omega})|\big]_- \, d\omega + \frac{1}{2\pi}\int_{-\pi}^{\pi} \big[\log|S(e^{j\omega})|\big]_+ \, d\omega = \sum_i \max\{0, \log|\lambda_i|\}, \qquad (4)$$
where $[\log|S(e^{j\omega})|]_- = \min\{0, \log|S(e^{j\omega})|\}$ and $[\log|S(e^{j\omega})|]_+ = \max\{0, \log|S(e^{j\omega})|\}$.

The constraint implies that the sensitivity cannot be small at all frequencies: a reduction in $\int_{-\pi}^{\pi}[\log|S(e^{j\omega})|]_-\,d\omega$ is achieved at the expense of an increase in $\int_{-\pi}^{\pi}[\log|S(e^{j\omega})|]_+\,d\omega$.
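Equation (4) can be checked numerically on a small example. The Python sketch below uses a hypothetical loop not taken from the text: the strictly proper plant $P(z) = 1/(z-2)$, with a single unstable pole at $z = 2$, and the controller $K(z) = (1.9z - 1.55)/(z - 0.9)$, which places both closed-loop poles at $z = 0.5$. It evaluates $S = 1/(1 + PK)$ on a uniform frequency grid, so each grid average approximates $\frac{1}{2\pi}\int_{-\pi}^{\pi}(\cdot)\,d\omega$. The negative part (disturbance attenuation near $\omega = 0$) is paid for by a positive part, and their sum matches $\log 2$.

```python
import cmath
import math


def plant(z):
    return 1.0 / (z - 2.0)                 # unstable pole at z = 2


def controller(z):
    return (1.9 * z - 1.55) / (z - 0.9)    # closed-loop poles at z = 0.5 (double)


def sensitivity(z):
    return 1.0 / (1.0 + plant(z) * controller(z))


def bode_parts(S, n_grid=4096):
    """Grid averages of [log|S|]_- and [log|S|]_+ over omega in [-pi, pi)."""
    neg = pos = 0.0
    for k in range(n_grid):
        omega = -math.pi + 2.0 * math.pi * k / n_grid
        val = math.log(abs(S(cmath.exp(1j * omega))))
        neg += min(0.0, val)
        pos += max(0.0, val)
    return neg / n_grid, pos / n_grid


neg, pos = bode_parts(sensitivity)
print(f"negative part {neg:.4f}, positive part {pos:.4f}, "
      f"sum {neg + pos:.6f}, log 2 = {math.log(2.0):.6f}")
```

Retuning `controller` to attenuate more at low frequencies shifts mass between the two parts, but, as long as the loop stays stable, their sum stays at $\log 2$.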

A natural question in this setting is: under what conditions can we break the Bode Integral Formula, or must it always hold? We addressed this question in [?] by analyzing the information dynamics of a feedback system using an entropy-flow analysis. First, we take an information-theoretic view of the disturbance $d$ and the controller signal $e$ as corrupted messages from $K$ and $P$, and analyze how restrictions on $\{K, P\} \in \mathcal{P}$, defined by a set $\mathcal{P}$ of allowable systems, impact the information content of these messages.

Specifically, let $d(t) = W(t)$ be a dynamic state of the world acting as a disturbance on $P$, and let $e = d - m_{KP}$. Further define

$$\mathcal{M} = \{\{m_{PK}, m_{KP}\} \mid m_{PK} \text{ is a linear function of the internal state of } P\},$$

$$\mathcal{P}_C = \{\{K, P\} \mid K \text{ is causal and } P \text{ is linear and controllable}\}.$$

We represent causality in the entropy-flow analysis by the flow constraint

$$I\big(d(t);\, (P_t u, P_0 x) \mid P_{t-1} d\big) = 0 \quad \forall t > 0,$$

where $P_t$ is the projection operator defined as $P_t a = (a(0), \ldots, a(t), 0, \ldots)$ for any signal $a = (a(t))_t$. Essentially, the constraint states that the controller provides no information about future signals given the past.

The assumption of causality immediately yields an interesting performance limitation: the entropy of the input $e$ into $P$ cannot be decreased below the external entropy injected into the system, as formalized by the following theorem.

Theorem 9 For any $\{K, P\} \in \mathcal{P}_C$ and $\{m_{PK}, m_{KP}\} \in \mathcal{M}$, $h(e(t)) \ge h(d(t)) + I(x(0); P_t e)$ for all $t > 0$.

This limitation is independent of stability and of the form of $K$ (linear, nonlinear, finite alphabet, etc.). It is also the basis for other performance limitations obtained by applying additional assumptions on $K$. In particular, define

$$\mathcal{P}_{CS} = \{\{K, P\} \mid K \text{ is causal and stabilizing, and } P \text{ is linear and controllable}\}.$$

We can represent stability in the entropy-flow framework as a constraint on the variance of the state:

$$\sup_t \mathbb{E}\big[x^T(t)\, x(t)\big] < \infty.$$

Also assume that $d$ and $e$ are asymptotically stationary stochastic signals (the weakest assumption under which they have power spectral densities). Let $F_d$ and $F_e$ be the respective power spectral densities of $d$ and $e$, and define $S(\omega) = \sqrt{F_e(\omega)/F_d(\omega)}$, a direct generalization of the sensitivity function to a stochastic setting. Under these mild assumptions, we get the following theorem.

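Theorem 10 (Fundamental limit of causal feedback) For any $\{K, P\} \in \mathcal{P}_{cs}$ and $\{m_{PK}, m_{KP}\} \in \mathcal{M}$, the Bode Integral Formula holds:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(\omega)\, d\omega \geq \sum_i \max\{0, \log|\lambda_i|\},$$

where the $\lambda_i$ are the eigenvalues (modes) of $P$.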
Figure 1: Shannon-Bode Tradeoffs: White Area Depends on Information Rate

The most interesting aspect of this theorem is that the form of $K$ (linear, nonlinear, lookup table, finite-alphabet output, etc.) is irrelevant; thus it yields a fundamental limit for any causal, stabilizing feedback loop around a linear plant. Beyond re-deriving these classical limitations, the information-theoretic analysis allowed us to derive performance limitations that cannot be addressed using traditional analysis techniques [?].
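As a numerical illustration of $S(\omega)$ and of Theorem 10, the sketch below uses a hypothetical scalar loop (our assumption, not from the report): $x(t+1) = 2x(t) + e(t)$ with $e(t) = d(t) - 1.8\,x(t)$ and white Gaussian $d$. It estimates $S(\omega) = \sqrt{F_e(\omega)/F_d(\omega)}$ from simulated data by averaging periodograms over non-overlapping segments, compares the estimate with the closed-form sensitivity of this loop, and evaluates $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S(\omega)\,d\omega$. For this linear controller the integral equals $\log|a| = \log 2$ by Jensen's formula, which is the value of $\sum_i \max\{0, \log|\lambda_i|\}$ for this plant; the theorem's point is that no causal stabilizing $K$, linear or not, can push it lower.

```python
import numpy as np

# Hypothetical scalar loop: plant x(t+1) = a x(t) + e(t), static linear
# controller output k x(t), plant input e(t) = d(t) - k x(t).
A, K_GAIN = 2.0, 1.8      # unstable plant pole a; gain with |a - k| < 1

def simulate(n: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(n)          # white Gaussian disturbance
    e = np.empty(n)
    x = 0.0
    for t in range(n):
        e[t] = d[t] - K_GAIN * x
        x = A * x + e[t]
    return d, e

def averaged_psd(s: np.ndarray, nseg: int) -> np.ndarray:
    """Mean periodogram over non-overlapping segments (frequencies 0..pi)."""
    m = len(s) // nseg
    segs = s[: m * nseg].reshape(m, nseg)
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)

def sensitivity(omega: np.ndarray) -> np.ndarray:
    """Closed form for this loop: |1 - a e^{-jw}| / |1 - (a - k) e^{-jw}|."""
    z_inv = np.exp(-1j * omega)
    return np.abs(1 - A * z_inv) / np.abs(1 - (A - K_GAIN) * z_inv)

nseg = 256
d, e = simulate(nseg * 1000)
S_hat = np.sqrt(averaged_psd(e, nseg) / averaged_psd(d, nseg))
S_true = sensitivity(2 * np.pi * np.fft.rfftfreq(nseg))
max_log_err = np.max(np.abs(np.log(S_hat) - np.log(S_true)))

# (1/2pi) * integral over [-pi, pi] of log S: the integrand is smooth and
# 2pi-periodic, so the mean over a uniform grid is very accurate.
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
bode = np.mean(np.log(sensitivity(w)))
print(max_log_err, bode, max(0.0, np.log(abs(A))))
```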

Theorem 11 (Generalized Bode Integral) If $\{K, P\} \in \mathcal{P}_{cs}$ and if $\mathcal{M}$ is constrained so that $m_{PK}$ passes through a noisy digital memoryless channel with capacity $c$, then


The bound is tight for certain Gaussian channels. By means of an argument similar to the water-bed effect, inequality (??) asserts a limitation on the maximum allowable disturbance rejection over any given bandwidth $\Delta\omega$ in terms of the unstable modes of $P$ and the information rate $c$. If the information rate is unconstrained ($c = \infty$), there is no limitation on attenuation other than, of course, the classical Bode Integral Formula.

Another information-theoretic analysis yielded a new generalization of the Bode Integral Formula to a case where it can be “broken” by giving the controller limited access to $d$ by means of an early-warning system [?]. In this context, $d$ is assumed to be a filtered white noise process, obtained by passing white noise $w$ through a shaping filter $G$, and it takes $m > 0$ time units to reach $P$. The early-warning system has access to $d$ without the delay and uses a channel (wireless, for example) to send $d$ to $K$. In our general framework, this is equivalent to $K$ having some observation $Y_K(t)$ of $d(t)$. The new fundamental performance limitation we derived is described in the following theorem.

Theorem 12 (Value of lookahead information) If $\{K, P\} \in \mathcal{P}_{cs}$, $\{m_{PK}, m_{KP}\} \in \mathcal{M}$, and $K$ has access to an early-warning system with capacity $c$ (that is, $I(Y_K(t); d(t)) \leq c$), then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(\omega)\, d\omega \geq \sum_i \max\{0, \log|\lambda_i|\} - c. \qquad (6)$$


In the case of an additive white Gaussian $d$ and $m \geq 1$, the bound is tight. A direct interpretation of this limitation is that the achievable disturbance rejection, measured by the integral of $\log S$, improves linearly with the information rate $c$.
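The right-hand side of (6) is straightforward to evaluate. A minimal sketch, assuming natural logarithms, so that the early-warning capacity $c$ must be expressed in nats per time step (use base-2 logarithms if $c$ is given in bits); the function name and the example modes are hypothetical:

```python
import math
from typing import Iterable

def lookahead_bound(eigenvalues: Iterable[complex], c: float) -> float:
    """Right-hand side of (6): sum_i max{0, log|lambda_i|} - c.

    Natural logarithms are used, so c must be in nats per time step.
    Stable modes (|lambda| <= 1) contribute nothing to the sum.
    """
    instability = sum(math.log(abs(lam)) for lam in eigenvalues if abs(lam) > 1)
    return instability - c

modes = [2.0, 0.5, 1.5j]          # hypothetical plant eigenvalues
for c in (0.0, 0.5, 1.0, 1.5):
    print(f"c = {c}: bound = {lookahead_bound(modes, c):.4f}")
```

Each nat of lookahead capacity lowers the bound by one nat, which is the linear improvement described above; once $c$ exceeds the total instability $\sum_i \max\{0, \log|\lambda_i|\}$, the bound no longer rules out net attenuation.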


3.3  Stability  and  Performance  in  Congestion  Games  with  Learning 

As transportation demand fast approaches infrastructure capacity, social planning for the efficient use of transportation networks (TNs) is attracting renewed research interest. Recent technological advances are making available intelligent traveler information systems that can provide real-time, location-specific traffic information and recommendations to drivers, thereby enabling them to re-plan their routes during their trip. Transportation networks provide a prototypical example of distributed decision networks. Moreover, their analysis and control are made particularly challenging by the fact that they involve huge numbers of self-interested, boundedly rational agents whose behavior can be influenced only indirectly, through incentives. Transportation networks have indeed been investigated in the economics literature, most notably in the context of (learning and evolution in) congestion games. However, these approaches tend to neglect most of the physical aspects of traffic dynamics and are therefore unable to explain, e.g., transient behaviors occurring in response to sudden, and possibly disruptive, changes in the network characteristics.

In the works [?, ?], we proposed a novel framework for the analysis of stability and robustness properties of traffic networks. In our model, we abstract the topology of the transportation network by a directed acyclic graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, in which each directed edge $e = (v, v')$ represents a road and each node $v \in \mathcal{V}$ represents a junction. We assume that there is a single origin-destination pair $v_o, v_d \in \mathcal{V}$, with a constant unit incoming flow at $v_o$, and that the population of drivers is large enough to be efficiently approximated by a continuum of agents. We then consider a system whose state consists of: a probability vector $\pi(t) = \{\pi_p(t) : p \in \mathcal{P}\}$ over the set $\mathcal{P}$ of simple paths from origin to destination, which captures the fraction of agents preferring each path over the others; and a vector $\rho(t) = \{\rho_e(t) : e \in \mathcal{E}\}$ whose components are the car densities on the different roads. The car density vector $\rho(t)$ evolves as the drivers, modeled as boundedly rational agents, navigate their way through the network by combining their preferences toward the different paths with observations of the current local congestion levels. The vector $\pi(t)$, on the other hand, evolves as the agents adapt their path preferences using global information on the current congestion levels in the network. We assume that such global information becomes available on a time scale much slower than the one on which the actual drivers' dynamics occur. In this way, the dynamics of $\pi(t)$ and $\rho(t)$ become intertwined through two feedback loops, each involving a significantly different kind of information: local information at a fast scale, and global information at a slow scale.
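Because $\mathcal{G}$ is acyclic, every origin-destination path is simple, and the path set $\mathcal{P}$ can be enumerated by depth-first search. The following Python sketch does this for a small hypothetical network (node names and edges are invented for illustration) and stores the state $(\pi, \rho)$ as dictionaries keyed by path and by edge:

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def simple_paths(edges: List[Edge], origin: str, dest: str) -> List[List[Edge]]:
    """All origin-destination paths of a DAG, each given as a list of edges."""
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    paths: List[List[Edge]] = []

    def extend(node: str, prefix: List[Edge]) -> None:
        if node == dest:
            paths.append(list(prefix))
            return
        for e in out.get(node, []):      # acyclicity guarantees termination
            prefix.append(e)
            extend(e[1], prefix)
            prefix.pop()

    extend(origin, [])
    return paths

# Hypothetical junctions: origin "o", destination "d", intermediate "a", "b".
edges = [("o", "a"), ("o", "b"), ("a", "b"), ("a", "d"), ("b", "d")]
paths = simple_paths(edges, "o", "d")
pi = {tuple(p): 1.0 / len(paths) for p in paths}  # uniform path preferences
rho = {e: 0.0 for e in edges}                     # empty roads
for p in paths:
    print(" -> ".join([p[0][0]] + [v for _, v in p]))
```

Note that $|\mathcal{P}|$ can grow exponentially with the size of the network, so this explicit enumeration is only practical for small examples.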
The system dynamics are then described by the system of ordinary differential equations


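$$\frac{d}{dt}\rho_e = \lambda_v\, G_e(\rho^v, \pi) - f_e, \quad e \in \mathcal{E},$$

$$\frac{d}{dt}\pi_p = \eta\,\big(F_p(\rho) - \pi_p\big), \quad p \in \mathcal{P},$$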
where $f_e = f_e(\rho_e)$ is the flow on link $e$, modeled as an increasing function of the traffic density with maximum flow capacity $C_e$; $v$ denotes the tail node of $e$, and $\mathcal{E}_v^-$ and $\mathcal{E}_v^+$ denote the sets of incoming and outgoing edges of $v$; $\lambda_v := 1$ if $v = v_o$ and $\lambda_v := \sum_{e \in \mathcal{E}_v^-} f_e$ if $v \neq v_o$ is the incoming flow at node $v$; $\rho^v = \{\rho_e : e \in \mathcal{E}_v^+\}$ is the vector of traffic densities on the outgoing edges of node $v$; $G_e(\rho^v, \pi)$ is the fraction of drivers taking edge $e$ when crossing node $v$, given the local density $\rho^v$ and the path preference profile $\pi$; $\eta > 0$ is the (typically small) ratio between the characteristic times of the fast and slow dynamics; and

$$F(\rho) = \arg\min_{\pi}\Big\{\sum_{p \in \mathcal{P}} \pi_p \sum_{e \in p} t_e(\rho_e) + \beta H(\pi)\Big\}$$

is the noisy best response to the current traffic density on the network, with $\beta > 0$, $H(\pi)$ a convex function, and $t_e(\rho_e)$ the average delay on edge $e$.
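The two coupled equations can be integrated numerically. The sketch below uses forward Euler on a hypothetical network of two parallel links from $v_o$ to $v_d$ (so paths and links coincide), and every model ingredient is an assumption chosen for illustration rather than the specification used in the report: $f_e(\rho) = C_e(1 - e^{-\rho})$, affine delays $t_e(\rho) = \ell_e + \rho$, a local split $G_e \propto \pi_e e^{-\rho_e}$, and the entropy choice $H(\pi) = \sum_p \pi_p \log \pi_p$, for which the noisy best response $F$ reduces to a softmax of $-T_p/\beta$, with $T_p$ the path delay.

```python
import numpy as np

# Hypothetical two-link network: both links run from v_o to v_d, so each
# link is also a path and pi, rho are indexed by link.
C = np.array([1.0, 0.8])     # flow capacities C_e (assumed)
ELL = np.array([1.0, 1.5])   # free-flow delays in t_e (assumed)

def flow(rho: np.ndarray) -> np.ndarray:
    """f_e(rho_e): increasing in rho_e and saturating at C_e."""
    return C * (1.0 - np.exp(-rho))

def delay(rho: np.ndarray) -> np.ndarray:
    """t_e(rho_e): assumed affine latency; single-link paths, so T_p = t_e."""
    return ELL + rho

def local_split(rho: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """G_e(rho^v, pi): preferences tilted toward less congested links."""
    w = pi * np.exp(-rho)
    return w / w.sum()

def noisy_best_response(rho: np.ndarray, beta: float) -> np.ndarray:
    """argmin of sum_p pi_p T_p + beta * sum_p pi_p log pi_p over the simplex."""
    z = -delay(rho) / beta
    w = np.exp(z - z.max())  # shift for numerical stability
    return w / w.sum()

def simulate(eta: float = 0.05, beta: float = 0.2,
             dt: float = 0.02, horizon: float = 300.0):
    rho = np.zeros(2)            # empty roads at t = 0
    pi = np.array([0.5, 0.5])    # uniform initial preferences
    lam = 1.0                    # unit inflow at the origin
    for _ in range(int(horizon / dt)):
        d_rho = lam * local_split(rho, pi) - flow(rho)      # fast, local
        d_pi = eta * (noisy_best_response(rho, beta) - pi)  # slow, global
        rho, pi = rho + dt * d_rho, pi + dt * d_pi
    return rho, pi

rho, pi = simulate()
print("densities", rho, "flows", flow(rho), "preferences", pi)
```

With $\eta = 0.05$ the preferences relax on a time scale of order $1/\eta = 20$, while the densities settle within a few time units; at the end of the run the link flows sum to the unit inflow, as expected at a rest point.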

A practical scenario that helps envision this setup is one in which every driver is equipped with a smart navigation unit that recommends a direction based on its computations on global traffic information. Drivers combine this recommendation with local information to navigate their way through the network. The resources required to collect global information and compute optimal paths scale with the size of the network; hence it is reasonable to expect that the navigation units update their recommendations relatively infrequently compared with typical transit times in large networks, and that drivers are aware of this latency. Therefore, the traffic dynamics are significantly influenced by the drivers' responses to local information.

In [?], we analyze the stability properties of this dynamical system. Under very mild assumptions on driver behavior, we show that the system converges to a neighborhood of the Wardrop equilibrium. The latter is a well-known notion of equilibrium configuration, characterized by equal expected delay on every path from source to destination that is actually chosen by some agent, so that no agent has any incentive to switch. Formally,

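$\rho^* := \{\rho^*_e : e \in \mathcal{E}\}$ is a Wardrop equilibrium if $\sum_{e \in \mathcal{E}_v^-} f_e(\rho^*_e) = \sum_{e \in \mathcal{E}_v^+} f_e(\rho^*_e)$ for all $v \notin \{v_o, v_d\}$, and

$$\rho^*_e > 0 \;\; \forall e \in p \;\Longrightarrow\; \sum_{e \in p} t_e(\rho^*_e) \leq \sum_{e \in q} t_e(\rho^*_e), \quad \forall q \in \mathcal{P}.$$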
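The two conditions above translate directly into a checker. A minimal Python sketch, assuming the edge delays $t_e(\rho^*_e)$ and flows $f_e(\rho^*_e)$ have already been evaluated and are passed in as dictionaries; the function names, the example network, and the tolerance `tol` are hypothetical:

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def conserves_flow(flow: Dict[Edge, float], origin: str, dest: str,
                   tol: float = 1e-9) -> bool:
    """Inflow equals outflow at every node other than v_o and v_d."""
    net: Dict[str, float] = {}
    for (u, v), f in flow.items():
        net[u] = net.get(u, 0.0) - f
        net[v] = net.get(v, 0.0) + f
    return all(abs(x) <= tol for node, x in net.items()
               if node not in (origin, dest))

def is_wardrop(paths: List[List[Edge]], density: Dict[Edge, float],
               delay: Dict[Edge, float], tol: float = 1e-9) -> bool:
    """Every path whose edges all carry positive density has minimal delay."""
    totals = [sum(delay[e] for e in p) for p in paths]
    best = min(totals)
    return all(total <= best + tol
               for p, total in zip(paths, totals)
               if all(density[e] > 0 for e in p))

# Hypothetical example: two two-hop routes o -> a -> d and o -> b -> d.
paths = [[("o", "a"), ("a", "d")], [("o", "b"), ("b", "d")]]
rho = {e: 0.5 for p in paths for e in p}
t_eq = {("o", "a"): 1.0, ("a", "d"): 2.0, ("o", "b"): 2.0, ("b", "d"): 1.0}
f_eq = {e: 0.5 for e in rho}
print(conserves_flow(f_eq, "o", "d"), is_wardrop(paths, rho, t_eq))
```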

Wardrop equilibria have been the object of extensive research, especially in relation to their inefficiency with respect to the social optimum and to the possibility of steering the Wardrop equilibrium toward a more socially efficient configuration through the use of tolls. In [?], it was shown that the asymptotic distance from the Wardrop equilibrium is controlled by both the time-scale ratio $\eta$ and the noise level $\beta$:

Theorem 13 (Impact of information and control on convergence) Let $\rho^*$ be the unique Wardrop equilibrium of the network and $f^* = \{f_e(\rho^*_e) : e \in \mathcal{E}\}$ the corresponding flow vector. Then

$$\limsup_{t \to \infty} \|f(t) - f^*\| \leq K(\eta + \beta)$$

for some positive constant $K$.

In [?], we study the robustness of the system to sudden disruptions. Such disruptions are modeled as drastic changes in the physical properties of some of the links, which decrease (and possibly annihilate) their flow capacity. We then look at the evolution of the system assuming that it starts at the Wardrop equilibrium of the unperturbed system, and that the agents' global preferences toward the different paths do not change significantly (because $\pi(t)$ evolves on a time scale too slow to be significant in the presence of sudden disruptions). We characterize the margin of stability $\gamma$ of the original equilibrium, i.e., the minimum total loss in flow capacity that makes the system unstable, as the minimum node cut of the network.

Theorem 14 (Robustness to graph perturbations) $\gamma = \min_{v \neq v_d} \sum_{e \in \mathcal{E}_v^+} (C_e - f^*_e)$.

This quantification of the margin of stability should be contrasted with the min-cut capacity of the network. Indeed, the former is equilibrium-dependent and never larger than the latter, which is equilibrium-independent; in fact, it is typically much smaller. Such a gap between the two is a consequence of the locality of the information available to the agents. The margin of stability provides a second-order optimization parameter for the optimal choice of tolls.